
RFC: Refactor LatentGP API slightly #216

Draft · wants to merge 3 commits into master
Conversation

willtebbutt (Member) commented Aug 25, 2021

As discussed in #202

I don't want to merge this until we're ready to make another breaking release (I couldn't figure out how to do this without it being breaking).

edit: my only reservation about this PR is that having to provide a function which returns a function which returns a distribution feels a bit complicated. Not sure whether there's a better approach. See the tests for an example -- it's just rather complicated for humans to interact with.

@@ -3,10 +3,9 @@
x = rand(10)
y = rand(10)

-lgp = LatentGP(gp, x -> MvNormal(x, 0.1), 1e-5)
+lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5)
A Member commented on this line:

This constructor of MvNormal was deprecated (JuliaStats/Distributions.jl#1362); it should be replaced with, e.g.,

Suggested change
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5)
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.01 * I)), 1e-5)
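For reference, a minimal self-contained sketch of the non-deprecated form (it needs LinearAlgebra for I; 0.01 * I is the isotropic covariance corresponding to a standard deviation of 0.1):

using Distributions, LinearAlgebra

f = randn(10)
# Old (deprecated): MvNormal(f, 0.1), where 0.1 was interpreted as a standard deviation.
# New: pass a covariance explicitly, e.g. an isotropic one built from UniformScaling.
d = MvNormal(f, 0.01 * I)
rand(d)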

st-- (Member) commented Aug 26, 2021

A Haskell coder would say that's just how functions work 😂 x -> f -> lik

But I do agree it's a bit cumbersome here. What was your motivation for doing it this way?

What do you think of typing up some concrete use-cases? (Not the full implementation, just how the API usage might look.)

I'm assuming you are thinking of some likelihood where location = gp(x) and scale = deterministic(x). How would we then handle location = gp1(x) and scale = gp2(x)?
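To make the two shapes being compared concrete, a small illustration (the Gaussian likelihood here is just a placeholder):

using Distributions, LinearAlgebra

# Curried form proposed in this PR: apply the inputs x first, get back a function of f.
lik_curried = x -> (f -> MvNormal(f, 0.01 * I))

# Uncurried alternative: take the inputs and the latent values together.
lik_uncurried = (x, f) -> MvNormal(f, 0.01 * I)

x, f = rand(10), randn(10)
d1 = lik_curried(x)(f)   # apply x, then f
d2 = lik_uncurried(x, f) # apply both at once; same distribution either way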

st-- (Member) commented Aug 26, 2021

What would stop us from simply passing the inputs x directly to the likelihood object when calling it? The LatentFiniteGP already carries x with it, so we could extract it from there in rand and logpdf.

Edited to add: We could also define a trait for "wants to be passed the inputs" so all the likelihoods that don't actually care about x (most of them at this point!) wouldn't have to take a dummy argument.
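A rough sketch of how such a trait could look; every name below is hypothetical, nothing like this exists in the package yet:

# Trait: does a likelihood want to see the inputs x, or only the latent values f?
abstract type InputDependence end
struct NeedsInputs <: InputDependence end
struct IgnoresInputs <: InputDependence end

# Default: most likelihoods ignore x and simply map f to a distribution.
input_dependence(lik) = IgnoresInputs()

# Build the "likelihood at x" without forcing x-agnostic likelihoods to take a dummy argument.
materialise_lik(lik, x) = materialise_lik(input_dependence(lik), lik, x)
materialise_lik(::IgnoresInputs, lik, x) = lik           # already a function of f
materialise_lik(::NeedsInputs, lik, x) = f -> lik(x, f)  # close over the inputs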

willtebbutt (Member, Author) commented Aug 26, 2021

But I do agree it's a bit cumbersome here. What was your motivation for doing it this way?

It just feels really clean / natural to mirror the FiniteGP structure. The idea that you have likelihoods which cover the entire space, and then you turn them into a finite-dimensional thing once you have some inputs.

What would stop us from simply passing the inputs x directly to the likelihood object when calling it? The LatentFiniteGP already carries x with it, so we could extract it from there in rand and logpdf.

I guess this relates to the above. This would also be a perfectly reasonable way to do this.

Whichever we went with, I think I would rather be consistent. Either we have

struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
    lgp::Tlgp
    x::Tx
end

and build everything on the fly (which is totally fine, it's all essentially zero-overhead), or we do

struct LatentFiniteGP{Tfx<:LatentGP, Tlikx}
    fx::Tfx
    likx::Tlikx
end

in which both fx and likx already have x baked in to them.

I can't say that I'm overly fussed either way. Do you have a preference, @st--?
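(For what it's worth, "build everything on the fly" for the first option could be sketched roughly like this; the field names follow the struct above, and rand_latent is just an illustrative stand-in, not the real rand implementation:)

using Random

function rand_latent(rng::Random.AbstractRNG, lfgp)
    lgp = lfgp.lgp
    fx = lgp.f(lfgp.x, lgp.Σy)        # FiniteGP built on demand from the stored inputs
    f = rand(rng, fx)                 # latent sample at x
    y = rand(rng, lgp.lik(lfgp.x)(f)) # curried likelihood: apply x, then f
    return (f=f, y=y)
end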

What do you think of typing up some concrete use-cases? (not the full implementation, just how the api usage might look)

I've not thought especially about how users will find using this -- we can always build utilities on top of the API provided that it supports everything that we need. In terms of low-level usage, I was imagining something like (as a heteroscedastic example):

f = GP(Matern52Kernel())
g = my_favourite_positive_valued_function
jitter = 1e-6
latent_gp = LatentGP(f, x -> (f -> MvNormal(f, map(g, x))), jitter)
elbo(approx, latent_gp(x), y)

or something qualitatively like this.
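A slightly more concrete version of that sketch (with an arbitrary choice of g, and using the non-deprecated MvNormal constructor; the LatentGP call assumes the curried API proposed in this PR):

using AbstractGPs, Distributions, LinearAlgebra

f = GP(Matern52Kernel())
g = xi -> 0.1 + xi^2   # any positive-valued function of a single input will do
jitter = 1e-6
# Curried likelihood: x fixes the per-point noise variances, f fixes the mean.
lik = x -> (f -> MvNormal(f, Diagonal(map(g, x))))
latent_gp = LatentGP(f, lik, jitter)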

I'm assuming you are thinking of some likelihood where location = gp(x) and scale = deterministic(x). How would we then handle location = gp1(x) and scale = gp2(x)?

You read my mind. For example

f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f -> MvNormal(f(x[1:N]), map(exp, f(x[N+1:end]))))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter)

This second example is really quite unpleasant, so we will definitely need utilities built on top of this to make it usable in practice, but hopefully the gist is clear. The point is just that we would now have the flexibility to express something like this.

(Incidentally, one of the things I like about this is that it doesn't privilege mean-field-per-process approximate posteriors.)

We could also define a trait for "wants to be passed the inputs" so all the likelihoods that don't actually care about x (most of them at this point!) wouldn't have to take a dummy argument.

I like this suggestion, as I agree that it would be sad if likelihoods that don't care about inputs had to care about them.

edit: the more that I think about it, the more that I like

struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
    lgp::Tlgp
    x::Tx
end

as the structure of the LatentFiniteGP. It mirrors the FiniteGP structure really nicely, and is very clean.

edit2: the only other reason to like the x -> f -> lik over (x, f) -> lik is that if you want to compute the likelihood for multiple values of f, there might be some computations that you can cache across all samples, if x is fixed. Not sure how much of a concern that is in practice, but we could presumably find an example where it matters a lot if we thought hard enough (e.g. throw something expensive like a neural network in the likelihood somewhere, and make it only depend on x)
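(As a toy illustration of that caching point, with a made-up expensive_input_transform standing in for the x-only neural network:)

using Distributions, LinearAlgebra

# With the curried form, x-only work runs once per set of inputs ...
lik = x -> begin
    ϕ = expensive_input_transform(x)  # hypothetical, depends only on x (e.g. a NN forward pass)
    # ... and is captured by the closure, so every latent sample f reuses it.
    f -> MvNormal(f, Diagonal(ϕ))
end
# With an uncurried (x, f) -> lik signature, a naive implementation would redo that work per f.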

devmotion (Member) commented:

edit2: the only other reason to like the x -> f -> lik over (x, f) -> lik is that if you want to compute the likelihood for multiple values of f, there might be some computations that you can cache across all samples, if x is fixed. Not sure how much of a concern that is in practice, but we could presumably find an example where it matters a lot if we thought hard enough (e.g. throw something expensive like a neural network in the likelihood somewhere, and make it only depend on x)

Not sure if it is a good idea but maybe one could dispatch on Base.Fix1{typeof(lik)} or more generally Base.Fix1{<:LikelihoodFunction} internally in a path that involves evaluations at the same x for multiple f.
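A minimal sketch of that dispatch idea (the helper name and the caching path are purely illustrative):

# Generic path: nothing is known about how the curried likelihood was built.
eval_lik_over_samples(likx, fs) = map(likx, fs)

# Specialised path: if the user wrote Base.Fix1(lik, x), both lik and x are recoverable
# (Fix1 stores them in its fields f and x), so x-only work could be cached across all fs.
function eval_lik_over_samples(likx::Base.Fix1, fs)
    lik, x = likx.f, likx.x
    # ... cache anything that depends only on x here ...
    return map(f -> lik(x, f), fs)
end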

willtebbutt (Member, Author) commented:

Not sure if it is a good idea but maybe one could dispatch on Base.Fix1{typeof(lik)} or more generally Base.Fix1{<:LikelihoodFunction} internally in a path that involves evaluations at the same x for multiple f.

That would definitely do it. Do we know whether Fix1 is an official part of the API? I had always assumed it's not exported because the Julia devs don't really want people depending on it...

st-- (Member) commented Aug 27, 2021

@willtebbutt your two examples seem inconsistent (and that's why I wanted to see some mock code usage!): in your first example, f is the concrete evaluation of the function at x (e.g. rand(gp(x, jitter)), or Gauss-Hermite quadrature points corresponding to the marginal distribution), and that makes sense to me. In the second example, however, it looks like f is the (non-Finite) GP object itself, which you then evaluate at different inputs. But that would mean you're passing a FiniteGP to the MvNormal. How would that work?

willtebbutt (Member, Author) commented Aug 27, 2021

You're completely correct. Good point. The second example should instead be something like

f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f::AbstractVector{<:Real} -> MvNormal(f[1:N], map(exp, f[N+1:end])))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter)

The point here is that the type of x tells you that the first N entries correspond to the first output, and the second N entries to the second (hence the need for x). You could also shove a consistency check in to ensure that length(x) == length(f), and to check that there are only two outputs in x. This does feel like something that has to be carefully orchestrated though.
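Those consistency checks might be sketched like this (assuming the MO input type carries the usual x and out_dim fields; everything else is illustrative):

using Distributions, LinearAlgebra

function lik(x::MOInputIsotopicByOutput)
    x.out_dim == 2 || throw(ArgumentError("this likelihood expects exactly two outputs"))
    N = length(x.x)   # number of underlying input features
    return function (f::AbstractVector{<:Real})
        length(f) == length(x) || throw(DimensionMismatch("length(f) must equal length(x)"))
        return MvNormal(f[1:N], Diagonal(map(exp, f[N+1:end])))
    end
end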

st-- (Member) commented Aug 27, 2021

lik(x::MOInputIsotopicByOutput) = (f::AbstractVector{<:Real} -> MvNormal(f[1:N], map(exp, f[N+1:end])))

Made me wonder if perhaps we could unify the two isotopic MO inputs into a single parametric type, so that we can pass-through dispatch on the type of isotopy, e.g. MOInputIsotopic{T} where T <: Union{ByOutput, ByFeature}, and

lik(x::MOInputIsotopic{T}) where {T} = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(T, f, 1), map(exp, get_mo_output(T, f, 2))))

? but actually for that to work you do need to know how many different outputs there are ... so maybe a MOInputIsotopic.type struct field that could then contain out_dim, and we have

lik(x::MOInputIsotopic) = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(x.type, f, 1), map(exp, get_mo_output(x.type, f, 2))))

but it still seems pretty clunky.
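For concreteness, that unified type might be sketched roughly as (entirely hypothetical, not the current KernelFunctions API):

abstract type IsotopicOrdering end
struct ByOutput <: IsotopicOrdering end
struct ByFeature <: IsotopicOrdering end

# One isotopic MO input type, parametrised by how the (feature, output) pairs are ordered.
struct MOInputIsotopic{T<:IsotopicOrdering,V<:AbstractVector}
    x::V          # underlying input features
    out_dim::Int  # number of outputs
end
MOInputIsotopic{T}(x::V, out_dim) where {T,V} = MOInputIsotopic{T,V}(x, out_dim)

# Pull the block of latent values belonging to output i out of the flat vector f.
get_mo_output(x::MOInputIsotopic{ByOutput}, f, i) = f[(i - 1) * length(x.x) .+ (1:length(x.x))]
get_mo_output(x::MOInputIsotopic{ByFeature}, f, i) = f[i:x.out_dim:end]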

And how would it work if you want to build a heterotopic multi-output heteroskedastic model, so if there are observations y1 and y2 which you may observe at different inputs, but wherever you observe y1 you need both f1 and f2, and where you observe y2 you also need f3 and f4? 😂

willtebbutt (Member, Author) commented Aug 27, 2021

And how would it work if you want to build a heterotopic multi-output heteroskedastic model, so if there are observations y1 and y2 which you may observe at different inputs, but wherever you observe y1 you need both f1 and f2, and where you observe y2 you also need f3 and f4? 😂

Yeah, this is a good point. Perhaps this could be achieved by nesting multi-output models? e.g.

GP(LinearMixingModelKernel([
    LinearMixingModelKernel(kernels_for_mean_processes),
    LinearMixingModelKernel(kernels_for_variance_processes),
]))

Might need to be nested the other way around, but hopefully the idea is clear.

An input for this would be something like

julia> feature_1, feature_2, feature_3 = 1.0, 2.0, 3.0
(1.0, 2.0, 3.0)

julia> MOInputIsotopicByOutputs([(feature_1, 1), (feature_2, 2), (feature_3, 1)], 2)
6-element MOInputIsotopicByOutputs{Tuple{Float64, Int64}, Vector{Tuple{Float64, Int64}}}:
 ((1.0, 1), 1)
 ((2.0, 2), 1)
 ((3.0, 1), 1)
 ((1.0, 1), 2)
 ((2.0, 2), 2)
 ((3.0, 1), 2)

or whatever.

Made me wonder if perhaps we could unify the two isotopic MO inputs into a single parametric type, so that we can pass-through dispatch on the type of isotopy, e.g. MOInputIsotopic{T} where T <: Union{ByOutput, ByFeature}, and

This could make sense. I wonder what the tradeoffs are here vs the supertype option that we've discussed before?

edit: I missed the mixing matrices from the kernels above. I was imagining they'd just be the identity matrix here, although maybe there are situations in which you want your noise variance to be proportional to the mean...

theogf (Member) commented Aug 27, 2021

What about having an InputDependentLikelihood which takes a function of the form you proposed:

# Wrapper marking a likelihood whose construction depends on the inputs x.
struct InputDependentLikelihood{F}
    f::F
end

# Plain likelihoods are returned unchanged; input-dependent ones are applied to x.
to_likelihood(lik, x) = lik
to_likelihood(lik::InputDependentLikelihood, x) = lik.f(x)

# Inside (lgp::LatentGP)(x):
LatentFiniteGP(lgp.f(x, lgp.Σy), to_likelihood(lgp.lik, x))

Since input-dependent likelihoods are rarer, it would make sense for them to be the ones requiring a bit of extra effort to set up?
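Usage under that sketch might then look something like (illustrative; g is any positive-valued function as in the heteroscedastic example above):

# Input-independent likelihood: unchanged, just a function of the latent values f.
lgp_homo = LatentGP(f, f -> MvNormal(f, 0.01 * I), 1e-6)

# Input-dependent likelihood: opt in explicitly by wrapping the curried form.
hetero_lik = InputDependentLikelihood(x -> (f -> MvNormal(f, Diagonal(map(g, x)))))
lgp_hetero = LatentGP(f, hetero_lik, 1e-6)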

st-- (Member) commented Oct 11, 2021

NB: lgp.Σy seems rather misleading; it has nothing to do with y. Could we just call the field jitter instead, to be more explicit?
